Chromatin Immunoprecipitation Sequencing ◾ 245
programs will create a control dataset by shuffling each of the sequences in the primary
input dataset.
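To make the idea of a shuffled control concrete, here is a minimal sketch that shuffles the letters of every sequence in a FASTA file. The function name shuffle_fasta, the seed argument, and the "_shuffled" header suffix are illustrative assumptions. It is a plain letter-by-letter shuffle and is not necessarily the procedure DREME applies internally.

```shell
# Hypothetical helper: shuffle the letters of each sequence in a FASTA file.
# $1 = FASTA file, $2 = optional random seed (defaults to 1).
shuffle_fasta() {
  awk -v seed="${2:-1}" '
    # Fisher-Yates shuffle of the characters of string s.
    function shuffled(s,   n, i, j, tmp, c, out) {
      n = length(s)
      for (i = 1; i <= n; i++) c[i] = substr(s, i, 1)
      for (i = n; i > 1; i--) {
        j = int(rand() * i) + 1
        tmp = c[i]; c[i] = c[j]; c[j] = tmp
      }
      out = ""
      for (i = 1; i <= n; i++) out = out c[i]
      return out
    }
    BEGIN { srand(seed) }
    # Header line: emit the previous shuffled sequence, then the new header.
    /^>/ { if (seq != "") print shuffled(seq); seq = ""; print $0 "_shuffled"; next }
    # Sequence line: join wrapped lines into one string.
    { seq = seq $0 }
    END { if (seq != "") print shuffled(seq) }
  ' "$1"
}
```

For example, "shuffle_fasta chip1_peaks.fasta 42 > chip1_control.fasta" would write a shuffled copy of the first sample. Each shuffled sequence keeps the length and base composition of its original but loses the order of the bases.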
The following DREME commands search for motifs in the FASTA sequences of the three
ChIP-Seq samples. Because each run may take a long time and this is only an exercise,
you can run the command for a single sample to save time. Run the commands from inside
the “motifs” directory, where the FASTA files are located.
dreme -verbosity 2 \
-oc dreme_motifs_chip1 \
-dna \
-p chip1_peaks.fasta \
-t 14400 \
-e 0.05
dreme -verbosity 2 \
-oc dreme_motifs_chip2 \
-dna \
-p chip2_peaks.fasta \
-t 14400 \
-e 0.05
dreme -verbosity 2 \
-oc dreme_motifs_chip3 \
-dna \
-p chip3_peaks.fasta \
-t 14400 \
-e 0.05
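The “-verbosity” option sets how much progress information is printed, “-oc” specifies the
output directory (overwriting it if it already exists), “-dna” specifies the type of sequence,
“-p” specifies the primary dataset, “-t” specifies the maximum elapsed time in seconds as a
stopping criterion (14400 seconds is 4 hours), and “-e” specifies the E-value threshold.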
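The three commands differ only in the sample name, so they can also be generated in a loop. The sketch below is a dry run that prints each command instead of executing it. The function name dreme_cmd is an illustrative assumption, and the file and directory names follow the pattern used in this section.

```shell
# Hypothetical helper: print the DREME command for one sample name (e.g., chip1).
dreme_cmd() {
  printf 'dreme -verbosity 2 -oc dreme_motifs_%s -dna -p %s_peaks.fasta -t 14400 -e 0.05\n' "$1" "$1"
}

# Dry run: print one command per sample.
for sample in chip1 chip2 chip3; do
  dreme_cmd "$sample"
done
```

Piping the printed lines to "sh" would run the commands one after another. To run a single sample only, shorten the list after "in" to one name.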
The output files are saved in the directories “dreme_motifs_chip1”, “dreme_motifs_chip2”,
and “dreme_motifs_chip3”. The motifs are reported in an HTML file, an XML file, and a text
file. You can open each of these files with an appropriate program. For example, change into
an output directory and display the HTML file using Firefox as follows:
firefox dreme.html
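On a system without a graphical browser, the text report can be inspected instead. The sketch below assumes that the text file is named dreme.txt and that each motif record starts with a line whose first field is MOTIF and whose second field is the motif sequence. The function name list_motifs is an illustrative assumption.

```shell
# Hypothetical helper: print the motif sequence from every MOTIF line of a text report.
# $1 = path to the text report (assumed to be dreme.txt).
list_motifs() {
  awk '$1 == "MOTIF" { print $2 }' "$1"
}
```

From inside an output directory, "list_motifs dreme.txt" would then print one motif sequence per line.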
Figure 6.19 shows the motifs as displayed in the HTML file. For each motif, the figure shows
the motif sequence, logo, RC logo (reverse complement logo), and E-value. The sequence logo
is a graphical representation of the sequence conservation of DNA nucleotides. At each
position, a DNA sequence logo shows a stack of the nucleobase letters A, C, G, and T. The
relative sizes of the letters reflect their frequencies in the aligned sequences. The motif
sequence uses the IUPAC nucleotide codes, which represent each of the 15 possible
combinations of the four bases, as shown in Table 6.2.
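To illustrate how an IUPAC motif sequence expands into concrete bases, the sketch below converts a motif into a regular expression in which every ambiguity code becomes a bracket expression. The function name iupac_to_regex and the example motif are illustrative assumptions. The code-to-base mapping is the standard IUPAC nucleotide mapping.

```shell
# Hypothetical helper: convert an IUPAC DNA motif into a regular expression.
# The replacements contain only A, C, G, T, and brackets, so applying them
# one after another cannot alter an earlier replacement.
iupac_to_regex() {
  printf '%s\n' "$1" | sed \
    -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' \
    -e 's/W/[AT]/g' -e 's/K/[GT]/g' -e 's/M/[AC]/g' \
    -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' -e 's/H/[ACT]/g' \
    -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

iupac_to_regex TGASTCA   # prints TGA[CG]TCA
```

The resulting pattern can be passed to a tool such as "grep -E" for a quick check against a FASTA file. Such a search covers the forward strand only and misses occurrences that span a line break in wrapped sequences.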